BUG: read_excel for ods files raising UnboundLocalError in certain cases #36175

asishm · 2020-09-06T22:58:37Z

closes Regression in pandas/io/excel/_odfreader.py (UnboundLocalError: local variable 'spaces' referenced before assignment) #36122
closes BUG: Was trying to read an ods file and ran into UnboundLocalError in odfreader.py #35802
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Need some guidance on tests here. Do we need tests for other file formats here as well? Also unsure about naming conventions.

The primary bug in this case (apart from the indentation problem in the original code) is that the code ignored cells whose XML had multiple child nodes that were not spaces.

Reverted the implementation from #33233 while incorporating part of its modifications.

jreback · 2020-09-07T20:36:55Z

pandas/io/excel/_odfreader.py

        super().__init__(filepath_or_buffer, storage_options=storage_options)

+    def _monkeypatch_odf_element_str(self):


woa, why don't you just change the impl to be correct? we never want to monkeypatch things in library code.

I agree that we should not be monkeypatching the library code and ideally this should be fixed in odfpy directly.

If you look at the fix, it essentially does exactly what the library does in https://github.com/eea/odfpy/blob/master/odf/element.py#L240-L244, except that it adds the case that PR #33233 addresses. This matters mostly because these nodes can be nested: the nested child nodes can also contain the multiple spaces that #33233 addresses, so the implementation has to be recursive.

For a cell like this:

there is no way to get all the spaces without recursion (or an equivalent). The original PR is also somewhat of a monkeypatch, as it essentially replaced str(cell) with _get_cell_string_value(cell).
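To make the nesting argument concrete, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` as a stand-in for odfpy. The XML below is a hypothetical cell made up for illustration (it is not the example from the original comment), and `cell_string_value` is an illustrative name, not a pandas or odfpy function.

```python
import xml.etree.ElementTree as ET

# ODF text namespace; <text:s text:c="N"/> encodes a run of N spaces.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TEXT_S = f"{{{TEXT_NS}}}s"
TEXT_C = f"{{{TEXT_NS}}}c"


def cell_string_value(node):
    """Collect text recursively, expanding <text:s> into real spaces."""
    parts = [node.text or ""]
    for child in node:
        if child.tag == TEXT_S:
            # text:c defaults to a single space when absent
            parts.append(" " * int(child.get(TEXT_C, 1)))
        else:
            # nested elements (e.g. <text:span>) may hold more <text:s> nodes
            parts.append(cell_string_value(child))
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical cell: spaces at the top level and inside a nested span.
xml = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    'a<text:s text:c="3"/>b'
    "<text:span>c<text:s/>d</text:span>e"
    "</text:p>"
)
result = cell_string_value(ET.fromstring(xml))
```

A non-recursive loop over the top-level children would see the `<text:span>` as a single opaque node and could not expand the `<text:s/>` inside it.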

right, why can't you just change _get_cell_string_value to what you are doing here? (and put a nice comment / link to the source & issue)

got it. done

not sure why the tests are failing. they seem unrelated.

WillAyd · 2020-09-09T03:51:38Z

Looks good - can you add a what’s new note for 1.2 bug fix section?

asishm · 2020-09-09T14:39:40Z

@WillAyd added what's new note

WillAyd · 2020-09-09T15:19:44Z

pandas/io/excel/_odfreader.py

+                    # https://github.com/pandas-dev/pandas/pull/36175#discussion_r484639704
+                    value.append(self._get_cell_string_value(fragment))
+            else:
+                value.append(str(fragment))


Is there any reason to call str(fragment) instead of always just calling self._get_cell_string_value(fragment)?

Then we'd need to handle the str method for all the different cases, which would require going into the spec. odfpy already has the str implementations done. The only problematic case was multiple spaces, which the previously linked PR addressed.
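A rough sketch of that dispatch, using toy classes rather than odfpy's real ones: `Text`, `Element`, and the `count` attribute are hypothetical stand-ins, and `get_cell_string_value` only approximates the shape of the method under review. Elements are walked recursively, space elements are expanded, and everything else is left to its own `str`.

```python
class Text:
    """Toy stand-in for a node whose str() is already correct."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


class Element:
    """Toy stand-in for an element with child nodes."""

    def __init__(self, qname, children=(), count=1):
        self.qname = qname
        self.childNodes = list(children)
        self.count = count  # only meaningful for the space element "s"


def get_cell_string_value(cell):
    value = []
    for fragment in cell.childNodes:
        if isinstance(fragment, Element):
            if fragment.qname == "s":
                # the one case str() gets wrong: a run of spaces
                value.append(" " * fragment.count)
            else:
                # recurse so nested space elements are expanded too
                value.append(get_cell_string_value(fragment))
        else:
            # everything else relies on the node's own str()
            value.append(str(fragment))
    return "".join(value)


cell = Element(
    "p",
    [
        Text("a"),
        Element("s", count=2),
        Element("span", [Text("b"), Element("s"), Text("c")]),
    ],
)
result = get_cell_string_value(cell)
```

The point of the fallback is that only the space case needs special knowledge; no other node type has to be reimplemented from the spec.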

WillAyd · 2020-09-10T19:16:26Z

pandas/io/excel/_odfreader.py

                    value.append(" " * spaces)
+                else:
+                    # recursive impl needed in case of nested fragments


As a reader it isn't clear to me what this means; is this a bug that needs to be fixed upstream?

before #33233, the value was str(cell) which relied on the upstream odfpy implementation.

#33233 attempted to fix cases where multiple spaces were getting skipped in the output. That is because odfpy's str implementation for Nodes/Elements of that specific type (from odf.text import S) does not include the actual number of spaces. This imo should have been addressed upstream.

#33233 also introduced the bug that causes the current UnboundLocalError, because one line is misaligned. Fixing the misalignment alone is not enough, though: with just the indentation fix, the output will still be missing certain fragments from the cell (for the file in #36122).
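For readers unfamiliar with this failure mode, here is a toy reproduction. It is not the pandas code: the function names and the `(kind, payload)` tuples are made up for illustration. It shows how a misaligned line can raise `UnboundLocalError`, and why fixing only the indentation still loses fragments.

```python
def join_fragments_buggy(fragments):
    """`spaces` is bound only inside the `if`, but used on every iteration."""
    value = []
    for kind, payload in fragments:
        if kind == "space":
            spaces = int(payload)
        # misaligned: also runs for non-space fragments
        value.append(" " * spaces)
    return "".join(value)


def join_fragments_indent_fixed(fragments):
    """Indentation fixed: no exception, but non-space fragments are dropped."""
    value = []
    for kind, payload in fragments:
        if kind == "space":
            spaces = int(payload)
            value.append(" " * spaces)
    return "".join(value)


fragments = [("text", "a"), ("space", "2"), ("text", "b")]

try:
    join_fragments_buggy(fragments)
    raised = False
except UnboundLocalError:
    # the first fragment is not a space, so `spaces` was never assigned
    raised = True

only_spaces = join_fragments_indent_fixed(fragments)
```

In the second function the text fragments "a" and "b" never reach the output, which is the toy analogue of the missing fragments described above.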

This was my original reason for doing a monkeypatch that patched the __str__ implementation of the Element nodes to include the S spaces Node. I have removed the monkeypatch and instead changed the implementation of _get_cell_string_value to have the same behavior as the monkeypatch. It still relies on odfpy's str implementation for all Element types except the Space Node; otherwise it pretty much mirrors Element.__str__ from odfpy.

Sorry for the wall of text.

jreback

minor comment, but lgtm.

jreback · 2020-09-11T12:33:50Z

doc/source/whatsnew/v1.2.0.rst

@@ -296,6 +296,7 @@ I/O
 - :meth:`to_csv` did not support zip compression for binary file object not having a filename (:issue: `35058`)
 - :meth:`to_csv` and :meth:`read_csv` did not honor `compression` and `encoding` for path-like objects that are internally converted to file-like objects (:issue:`35677`, :issue:`26124`, and :issue:`32392`)
 - :meth:`to_picke` and :meth:`read_pickle` did not support compression for file-objects (:issue:`26237`, :issue:`29054`, and :issue:`29570`)
+- Bug in :meth:`read_excel` with `engine="odf"` caused UnboundLocalError in some cases where cells had nested child nodes (:issue:`36122`, and :issue:`35802`)


use double back ticks on UnboundLocalError

jreback · 2020-09-13T22:57:50Z

thanks @asishm

simonjayhawkins · 2020-09-14T07:36:09Z

from #36122 (comment)

This used to work fine (one month ago on my machine). I tried upgrading to Pandas 1.1.1 and get the same bug. It looks like what's below:

could this fix go in 1.1.3?

jreback · 2020-09-14T10:57:42Z

yep could backport

simonjayhawkins · 2020-09-14T11:27:45Z

ok will raise PR on master first to move release note and then do a manual backport.

can't auto backport this (followed by release note move) as no doc/source/whatsnew/v1.2.0.rst on 1.1.x

simonjayhawkins · 2020-09-14T11:32:59Z

ok will raise PR on master first to move release note and then do a manual backport.

second thoughts will do the backport first and then if all ok move release note on master to be in sync


asishm added 5 commits September 5, 2020 02:00

ods monkeypatch

3fcbc01

add test - ods only

3738fd1

add files to tests

c4edbd2

Merge branch 'master' of github.com:pandas-dev/pandas into ods

bfd6069

flake8

9f77b56

jreback requested changes Sep 7, 2020


jreback added the IO Excel read_excel, to_excel label Sep 7, 2020

asishm added 2 commits September 8, 2020 19:23

revert monkeypatch and change impl

29b8cda

Merge branch 'master' of github.com:pandas-dev/pandas into ods

334a528

add comment linking discussion + add whatsnew

21cc458

WillAyd reviewed Sep 9, 2020


WillAyd reviewed Sep 10, 2020


jreback reviewed Sep 11, 2020


whatsnew formatting fix

0e6b2e1

jreback added this to the 1.2 milestone Sep 13, 2020

jreback approved these changes Sep 13, 2020


jreback merged commit 88bc2e4 into pandas-dev:master Sep 13, 2020

simonjayhawkins added the Still Needs Manual Backport label Sep 14, 2020

simonjayhawkins modified the milestones: 1.2, 1.1.3 Sep 14, 2020

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Sep 14, 2020

Backport PR pandas-dev#36175: BUG: read_excel for ods files raising U…

988ae2e

…nboundLocalError in certain cases

simonjayhawkins mentioned this pull request Sep 14, 2020

Backport PR #36175: BUG: read_excel for ods files raising UnboundLocalError in certain cases #36355

Merged

simonjayhawkins removed the Still Needs Manual Backport label Sep 14, 2020

jreback pushed a commit that referenced this pull request Sep 14, 2020

Backport PR #36175: BUG: read_excel for ods files raising UnboundLoca…

7361ccb

…lError in certain cases (#36355) Co-authored-by: Asish Mahapatra <[email protected]>

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 14, 2020

DOC: move release note for pandas-dev#36175

8d1a0fa

simonjayhawkins mentioned this pull request Sep 14, 2020

DOC: move release note for #36175 #36363

Closed

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 15, 2020

DOC: move release note for pandas-dev#36175 (pt1)

e2039ce

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 15, 2020

DOC: move release note for pandas-dev#36175 (pt2)

f8203dc

jreback pushed a commit that referenced this pull request Sep 16, 2020

DOC: move release note for #36175 (pt1) (#36378)

98e4a2b

jreback pushed a commit that referenced this pull request Sep 16, 2020

DOC: move release note for #36175 (pt2) (#36379)

15285e7

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Sep 16, 2020

Backport PR pandas-dev#36378: DOC: move release note for pandas-dev#3…

9f91334

…6175 (pt1)

rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request Sep 17, 2020

DOC: move release note for pandas-dev#36175 (pt1) (pandas-dev#36378)

b82c7d1

rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request Sep 17, 2020

DOC: move release note for pandas-dev#36175 (pt2) (pandas-dev#36379)

91c37a2

simonjayhawkins added a commit that referenced this pull request Sep 18, 2020

Backport PR #36378: DOC: move release note for #36175 (pt1) (#36398)

d5e2333

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

BUG: read_excel for ods files raising UnboundLocalError in certain ca…

1ea8e82

…ses (pandas-dev#36175)

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

DOC: move release note for pandas-dev#36175 (pt1) (pandas-dev#36378)

910e08a

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

DOC: move release note for pandas-dev#36175 (pt2) (pandas-dev#36379)

1b4e49d

asishm deleted the ods branch January 5, 2021 09:37
